Calculating potentials of mean force and diffusion coefficients from nonequilibirum 

processes without Jarzynski's equality 
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In general, the direct application of the Jarzynski equality (JE) to reconstruct potentials of mean 
force (PMFs) from a small number of nonequilibrium unidirectional steered molecular dynamics 
(SMD) paths is hindered by the lack of sampling of extremely rare paths with negative dissipative 
work. Such trajectories, that transiently violate the second law, are crucial for the validity of JE. 
As a solution to this daunting problem, we propose a simple and efficient method, referred to as 
the FR method, for calculating simultaneously both the PMF U(z) and the corresponding diffusion 
coefficient D(z) along a reaction coordinate z for a classical many particle system by employing a 
small number of fast SMD pullings in both forward (F) and time reverse (R) directions, without 
invoking JE. By employing Crook's transient fluctuation theorem (that is more general than JE) 
and the stiff spring approximation, we show that: (i) the mean dissipative work Wd in the F and R 
pullings are equal, (ii) both U(z) and Wd can be expressed in terms of the easily calculable mean 
work of the F and R processes, and (iii) D(z) can be expressed in terms of the slope of Wd- To test 
its viability, the FR method is applied to determine U(z) and D(z) of single-file water molecules in 
single- walled carbon nanotubes (SWNTs). The obtained U(z) is found to be in very good agreement 
with the results from other PMF calculation methods, e.g., umbrella sampling. Finally, U(z) and 
D(z) are used as input in a stochastic model, based on the Fokker-Planck equation, for describing 
water transport through SWNTs on a mesoscopic time scale that in general is inaccessible to MD 
simulations. 



I. INTRODUCTION 



The study of the structure-function relationship of 
large biomolecules often requires to follow their dynam- 
ics on a meso- or even macro-scopic time scale while re- 
taining its atomic scale spatial resolution. A typical ex- 
ample is molecular and ion transport through channel 
proteins^. While structural details of the inner lining of 
the channel in particular, and that of the protein-lipid- 
solvent environment in general, are needed at atomic res- 
olution in order to determine the forces that guide the 
diffusion of the particles across the channel, the duration 
of the permeation process may exceed by several orders 
of magnitude the time scale of several tens of nanosec- 
onds currently attainable by all atom molecular dynam- 
ics (MD) simulations 2 . In this case a simplified alter- 
native approach is to model the transported molecule in 
the channel as an overdamped Brownian particle that 
diffuses along the axis of the channel in the presence 
of an effective potential of mean force (PMF) that de- 
scribes its interaction with the rest of the atoms in the 
system- . A PMF is the Landau free energy profile along 
a reaction coordinate (RC), or order parameter^, and it 
can be determined from the equilibrium statistical dis- 
tribution function of the system by systematically inte- 
grating out all degrees of freedom except the RC£. In 
principle, both the effective diffusion coefficient and the 
PMF, quantities that enter the Langevin equation of mo- 
tion (or, equivalently, the corresponding Fokker-Planck 
equation^) which determines the dynamics of the trans- 
ported molecule, can be determined from MD simula- 
tions. In practice, however, the calculation of free energy 
differences and PMFs are rather difficult and computa- 



tionally expensive^. 

Since even the longest equilibrium MD (EMD) tra- 
jectories can sample only a small region of the RC do- 
main of interest, the one situated in the vicinity of the 
corresponding PMF minimum, simple EMD simulations 
are not suitable for PMF calculations. The traditional 
method for calculating PMFs by means of biased EMD 
simulations is umbrella sampling (US)£»£& However, US 
may become inefficient and computationally unaffordable 
when the number of required sampling windows becomes 
too large. This may happen when the amplitude of the 
equilibrium fluctuations of the RC is very small compared 
to the size of the RC interval in which the PMF is sought. 

In such cases the RC can be sampled efficiently by em- 
ploying steered molecular dynamics 10 (SMD) in which 
the system is guided (or steered), according to a prede- 
fined protocol, along the RC by using, e.g., a harmonic 
guiding potential (HGP). By choosing a sufficiently large 
value for the elastic constant of the HGP, i.e., within the 
stiff-spring approximationAi*A£ (SSA), the distance be- 
tween the target and actual value of the RC at a given 
time can be kept below a desired value. In general, for 
a large system (~ 10 5 atoms) computationally one can 
afford only a limited number (typically < 10) of such 
nonequilibrium SMD pullings, and the real challenge is 
to find a way to reconstruct the PMF (at least semi- 
quantitatively) along the RC using this limited amount 
of data. In principle, the equilibrium PMF can be re- 
constructed from the celebrated Jarzynski equality (JE) 
that relates the equilibrium free energy difference AF be- 
tween two states to the average of the external work 
W done along all nonequilibrium paths that connect 
those states and are subject to the preestablished RC 
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In terms of the dissipative work 
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W d = W — AF, JE can be written as (exp(— |3W d )} = 1 , 
where (3 = 1/1cbT, Icb is the Boltzmann constant and T 
is the temperature of the heat bath (environment). We 
note that if the SMD pulling occurs infinitely slowly then 
the system is in equilibrium at all times and W d = 
(reversible paths). Thus JE is trivially satisfied and 
W = W Tev = AF is the reversible work. In general 
SMD pullings are nonequilibrium with W d > along 
most of the trajectories. However, the validity of JE de- 
pends crucially on a small fraction of trajectories with 
W d < 0, that transiently violate the second law. Since 
such trajectories (whose number decreases exponentially 
with W d /k.BT) are very unlikely to occur among a few 
fast SMD pullings, it is clear that the sought PMF can- 
not be determined by the direct application of JE, ex- 
cept when the pulling paths are close to equilibrium (i.e., 
when Wd ^bT). Under near-equilibrium conditions, 
the validity of JE has been confirmed in an RNA stretch- 
ing experiment^. Also, JE has been successfully ap- 
plied for free energy calculations in computer simulation 
of small and/or simplified model systems. However, in 
spite of a large number of papers dedicated to the ap- 

plications Of JE 1 6 1 17,18 1 19,20 1 21,22 1 23 1 24 1 25 1 26 and other fluc _ 

tuation theorems^^, there are surprisingly few studies 
which use SMD simulations combined with the JE to cal- 
culate PMFs for large biomolecules 11 - 29 . 

The purpose of this paper is to propose a simple and 
efficient method, referred to as the FR method, for calcu- 
lating simultaneously both the PMF U(z) and the corre- 
sponding diffusion coefficient D(z) along a reaction coor- 
dinate z for a classical many particle system by employing 
a small number of fast nonequilibrium SMD pullings in 
both forward (F) and time reverse (R) directions, without 
invoking JE. In fact, as already mentioned, for such lim- 
ited number of processes JE fails to hold. The essence 
of the FR method, detailed in Sec. [H] can be summa- 
rized as follows: Several fast F and R SMD pullings 
are carried out within the SSA. The latter guaranties 
that (i) the RC follows closely its target value deter- 
mined by the pulling protocol, (ii) the change in PMF 
(AU) is well approximated by the corresponding change 
in the free energy (AF) of the system biased by the HGP, 
and (iii) the work distribution function Pf/r(W) along 
F/R paths is Gaussian. A few F and R SMD trajec- 
tories are sufficient to sample Pf/r(W) about its max- 
imum (see Fig. ^) and, therefore, determine approxi- 
mately the mean F/R work Wp/R. However, the same 
data is insufficient for even a rough estimate of the vari- 
ance a\y = W 2 — W , i.e., of the actual width of P F / R (W). 
From Crooks' transient fluctuation theorem*^ (TFT) [see 
Eq. (|f 6fl ]. which is more general than JE, follows that if 
Pp(W) is Gaussian then Pr(W) is also a Gaussian with 
the same variance (width) cx^, = 2ksT W^, and peak po- 
sition W R = Wp — 2AF. Thus, (i) the PMF is given by 
AU = AF = (W F — W R )/2, and (ii) the mean dissipa- 
tive work is the same for both F and R paths, given by 
W d = (W F + W R )/2. From W d the position dependent 
diffusion coefficient is D = kBTv/(dW d /dz), where v is 



the pulling speed. 

Thus, the reason why previous studies failed to recon- 
struct the PMF from unidirectional SMD pullings far 
from nonequilibrium by using JE is because such ap- 
proach requires the complete sampling of the correspond- 
ing work distribution function that is simply impossible 
to obtain from a limited number of pullings. (In fact, 
the sampling of P(W) has to be so complete that it must 
include paths with W d < as discussed above.) While 
the mean work can be easily estimated, breaking this 
up into the PMF and the mean dissipative work (i.e., the 
heat exchanged with the environment) requires either the 
knowledge of the precise width (variance) of the F work 
distribution function (e.g., when the F SMD paths are 
close to equilibrium and W d is small) or additional in- 
formation that may come from a set of R SMD pullings, 
as outlined above. The solution to this problem offered 
by our FR method is surprisingly simple, however, its 
validity depends crucially on Crooks' TFT (from which 
JE can be derived) and the Gaussian nature of Pf/r(W) 
guaranteed by SSA. In particular, the conclusion that the 
mean dissipative work is the same for both F and R SMD 
paths is highly non trivial. 

The remaining of the paper is organized as follows. In 
Sec. |n] we describe in detail the theoretical basis of our 
proposed FR method. In order to test its efficiency and vi- 
ability, in Sec. IIIII we apply the FR method to calculate 
the PMF and the position dependent diffusion coefficient 
of water molecules moving across densely packed single 
walled carbon nanotubes (SWNT) that connect two wa- 
ter reservoirs. The obtained PMF is compared with the 
ones obtained from EMD and US simulations. Finally, 
conclusions are drawn in Sec. IIVI 



II. THEORY 

We consider a classical many particle system (e.g., a 
channel protein in a fully solvated lipid bilayer) described 
by the Hamiltonian Ho(F), where V = {r,p} represents 
the coordinates and momenta of all the atoms in the sys- 
tem. The dynamics of the system may be either deter- 
ministic or stochastic, but we assume that the conditions 
for which JE and TFT hold are met, i.e., the dynamics 
are Markovian and preserve the equilibrium ensemble, 
and the energy of the system is finite^,. These condi- 
tions are met in MD simulations in both NVT and NpT 
ensembles^. 



A. Reaction coordinate and PMF 

In general, any PMF calculation starts with the iden- 
tification of a properly chosen RC whose change in time 
describes the evolution of the state of the system^. For 
example, in describing the progression of a transported 
molecule in a nanopore (e.g., channel protein or SWNT) 
a proper RC is the projection of the COM of the molecule 
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FIG. 1: (Color online) Gaussian forward (long-dashed), re- 
verse (dotted) and dissipative (solid) work distribution func- 
tions within the stiff spring approximation. The shaded re- 
gion in P F / R (W) is the one sampled in F/R SMD pullings. 
The tail region of Pd(VV) corresponding to negative dissipa- 
tive work is also highlighted. 



(or of a part of the molecule) on either the permeation 
direction across the membrane, hereafter denoted by z, 
or on the axis of the pore. If the channel is relatively 
straight then the difference between the two RC choices 
can be neglected. For simplicity, here we assume that 
this is always the case. 

By definition, the PMF U(z) is determined from the 
equilibrium distribution function of the system by inte- 
grating out all degrees of freedom except the reaction 
coordinate z, i.e.^ 



e-P u < z »^p (z) 



ar- 



>-|3H (n 



-6[z-2(r)] 



(i) 



where po(z) is the equilibrium distribution function of 
the reaction coordinate, Zo is the partition function and 
6(z) is the Dirac-delta function whose filtering property 
guarantees that the integrand in Eq.Q is nonzero only 
when the RC has the desired value, i.e, when z(T) = z. 
Hereafter we use the convention that z [or z(t)] represents 
the target value of the RC while £ = £(T) represents the 
actual value of the RC. Also, unless otherwise stated, 
throughout this paper the energy is measured in units of 
IcbT, e.g., in Eq. J3> one needs to set (3 = 1. 

In principle, the equilibrium distribution function 
po(z) can be easily computed from EMD simulations, 
since it is proportional to the logarithm of the binned 
histogram of the RC sampled along the MD trajectory. 
Thus, the PMF is readily given by 



U(z) =-log[ Po (z) 



(2) 



In terms of the U(z) the equilibrium average of any func- 
tion f (£) of the RC can be calculated as 



(f(Z)>o = 



-H (n 



ar- 



-f(£) 



Zo 

dze- u(z) f(z) = 



dz6[z-z(H] (3) 
dzp (z)f(z) . 



In practice, however, even the longest EMD trajecto- 
ries sample only a restricted region of the reaction coor- 
dinate domain of interest (i.e., within the vicinity of the 
PMF minimum) and the direct application of Eq. is 
impractical. 



B. Harmonic guiding potential 

In order to properly sample energetically more diffi- 
cult to reach regions, one needs to guide or steer the 
system towards those regions by employing, e.g., a har- 
monic guiding potential (HGP) 



V s (£) = V(£(r)|z) = jW) 



(4) 



where k = k z is the stiffness (elastic constant) of the 
HGP. With this extra potential energy, the Hamiltonian 
of the new biased system becomes H z = Ho + V z (£). As 
a result, atom "j" in the selection that define the RC will 
experience an additional force 



P av z 3£(n 

Fj = —r— = -k[z r - zj — 

01*j Ol-j 



(•5) 



Thus, the HGP wm force the system to evolve in the 
configuration space in such a way that at all times £ stays 
confined in the vicinity of z. 

The free energy difference 6F Z = F z — Fo between the 
equilibrium states of the systems described by the Hamil- 
tonians H z and Ho can then be written as a Gaussian 
convolution of exp[— U(z)]. Indeed, 



,-SF. 



dr 



e -|3H (n 



-v 2 (£(r>) 



Zo 

dz'e- u(z ''e- v * (z,) 



,Vz(£) 



dz'e 



-u(z'; 



e 2 



(6) 

U-z') 2 



C. Stiff-spring approximation 

The sought PMF, U(z), can be obtained from Eq. © 
by Gaussian deconvolution of the free energy factor 
exp(— 6F Z ). However, it is more convenient to resort to 
the large k or stiff-spring approximationM^^ (SSA). 
Assuming that we seek to determine U(z) with a spatial 
resolution 5z, by choosing the spring constant such that 
k ^> 2/(6z) 2 one can easily see that in Eq. |J?J| the main 
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contribution to the last integral comes from the region 
|z — z'| <C 6z, and therefore one can write 



-6F Z 



-U(z) 



dz'e _ * ( 



-Ufzl 



(7) 



Now, taking the logarithm of both sides in Eq. J7J, one 
obtains 6F Z = F z — Fq = U(z] + const and, therefore, 



AU = U(z)-U(z ) 



AF = F z - F z 



(8) 



Thus, within the SSA the PMF of the unbiased system 
is well approximated by the free energy difference of the 
system biased by the HGP. Note that in SMD simula- 
tions, to make sure that the distance between the tar- 
get z(t) and actual z values of the RC on average stays 
smaller than the desired 6z, one needs to chose the spring 
constant according to 



k > max 



2a 2U r 



(6z) 2 ' (6z) 2 



(9) 



where U max is the highest PMF barrier one wants to 
explore, and a 1 . 



D. PMF from umbrella sampling and WHAM 



! J - the range of RC values 
is divided into N w sampling 



In umbrella sampling 5 
of interest (z min ,z max 
windows centered about conveniently chosen values zt, 
i = 1,...,N W . Next, the reaction coordinate is sam- 
pled in each window separately by preparing identical 
replicas of the system and applying the harmonic guid- 
ing potential V Zi (z). As a result, the biased distribu- 
tion functions can be readily obtained by direct sam- 
pling of the reaction coordinate for the biased system, 
i.e, Pt(z) = (Zo/Z-jJ e~ Vi(z) po(z), where, for brevity, the 
index z\ has been replaced by i. By inverting this equa- 
tion, the equilibrium distribution in each window can be 
expressed in terms of the biased distribution of the re- 
action coordinate. The standard method for efficiently 
stitching together the biased pi(z)'s in order to obtain 
the equilibrium po(z), and therefore the sought PMF, 
is the so called weighted histogram analysis method or 
WHAM 32 , according to which 



Polzj 



dzp (z)e 



-Vt(s) 



(10a) 



(10b) 



with Ni the number of data points used to construct 
Pi(z). The above non-linear coupled WHAM equations, 
that need to be solved iteratively, minimize the errors in 
determining po(z). When applicable, US combined with 
WHAM is perhaps the best choice for calculating PMFs. 
In practice, however, one often encounters situations in 
which the minimum number of US windows required to 



properly cover the range of RC values of interest is ex- 
cessively large and the application of the method may 
become computationally unattainable. Molecular trans- 
port in channel proteins is a good example. 



E. SMD, Transient Fluctuation Theorem and the 
Jarzynski equality 

In SMD simulations^ ", where initially the system is 
in an equilibrium state characterized by z(0), the target 
value of the RC z(t) (also referred to as control param- 
eter) is varied in time according to a prescribed proto- 
col. For example, in constant velocity SMD (cv-SMD) 
z(t) = z(0)+vt, < t < t, where v is the constant pulling 
speed equal to the ratio of the total pulling distance to 
the desired simulation time. We refer to the SMD pulling 
paths of the system when t increases from to x as for- 
ward (F) paths. The time reverse (R) pulling paths are 
obtained by starting the system from an equilibrium state 
corresponding to z(t) and reversing the sign of t in z(t) 
for F paths. In our cv-SMD example, this amounts to set- 
ting ZR(t) = zf(t — t) = z(t) — vt, < t < t. The choice 
of a sufficiently large spring constant (see Sec III C|l in the 
now time dependent HGP [Eq. (0J] guarantees that the 
instantaneous RC, z(t), follows closely the target value 
z(t) during the pulling process. Thus, cv-SMD is a fast 
sampling method of the RC by driving the system out of 
equilibrium. The faster the pulling the more significant 
is the deviation from equilibrium. The work done during 
a cv-SMD simulation is given by 



W t = W z 



z(t) 



dz[dV z (z)/dz] =k 



z(t) 



dz(z 



(11) 

Crooks has shown that under rather general condi- 
tions, listed at the beginning of this section, the following 
noncquilibrium fluctuation theorem holds 30 



(f(W)e- w ") ? Hf(-W)) 



R ' 



(12a) 



or 



<f(W)) F = (f(-W)e- w -) R . (12b) 
Here f (W) is an arbitrary function of the work W, and 



(...) 



F/R 



dWP F/R (W) 



(13) 



represents the average over forward/reverse paths or, 
equivalently, the average with respect to the for- 
ward/reverse work distribution functions Pf/r(W). The 
dissipative work in a F/R process is given by 



W dF/R = W F/R T AF , 



(14) 



with AF = F Z ( T ) — F z (o). The JE follows immediately from 
Eqs. (|12|) by setting f(W) = 1, and it can be written in 
any of the following forms 



(exp(-W dF )) F = (exp(-W dR )) R = 1 , 



(15a) 
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AF 



exp(-W)) F = e- Ah , (exp(-W)) R = e 



Another important equality, that connects the F and 
R work distribution functions, can be derived from 
Eqs. by setting f(W') = 6(W — W) and carrying 
out the integral with respect to W. The result is Crooks' 



transient fluctuation theorei 
Pf(W) 



,30 



(TFT) 



- „w dF 



Pr(-W) 



by Hummer and SzaboSLS, and indirectly by Crooks 30 . 
The nonequilibrium fluctuation theorem due to Crooks 

(15b) 

can also be written as 

<f[z(t)] exp(-W d )) F = (f [z R (0)]) R = (f[i(t)]) eq (19) 

where zj>(t) represents the time evolution of the control 
parameter during reverse pullings, f[z] is an arbitrary 
function and the index "eq" means the equilibrium aver- 
age corresponding to the biased system with Hamiltonian 
( 16 ) H Zo . By inserting f [z] = 6(z-z) into Eq. (T!3$ one obtains 



This equation is used to derive our new results in 

sec. mm 



F. PMF from unidirectional SMD and the 
Jarzynski equality 

An increasingly popular alternative for calculat- 
ing PMFs is based on the application of the 
JE from repeated unidirectional nonequilibrium SMD 
simulation^ WW™ i 22 i 23 i 31 . Within the SSA the 

sought PMF can be readily obtained from Eqs. (jSJ and 



AU(z) w AF = -log (cxp(-W z )) f 



(17) 



Here the index F indicates that the average is taken over 
the ensemble of forward pulling paths. As already men- 
tioned, the average of the exponential in Eq. (|17fl cannot 
be estimated reliably even for a reasonably large number 
of SMD pullings, unless the pulling speed is sufficiently 
small so that the system is close to equilibrium along 
the pulling paths. This is due to the fact that the over- 
lap between exp(— W) and the sampled part of Pp(W) is 
in general exponentially small. Nevertheless, there exist 
two approaches that in principle may give fairly good es- 
timates of Eq. I|17f) . provided that the system is not too 
far from equilibrium during pullings. The first method is 
the cumulant approximationiiiSi^i, according to which 



AU(z) = -log(exp(-W z )> 

o- 2 = wF-w? , 



W z -oi/2, (18a) 
(18b) 



where for simplicity we have dropped the index "F" and 
cr 2 is the variance (2nd cumulant) of the work, ft has 
been shown that within SSA the work distribution func- 
tion Pf(W) is Gaussian, and therefore generally recog- 
nized that in this case the cumulant approximation H18(l 
in fact is exact. However, as mentioned in Sec. [I] the 
reason why in practice Eq. (|18fl is valid only close to 
equilibrium is because SMD pulling paths can sample 
only a narrow region about the peak of the Gaussian 
Pp(VV). This allows for a fairly accurate determination 
of the mean work (W z ) but, in general seriously under- 
estimates the variance cr 2 . 

The second method for evaluating the average in 
Eq. I|17|) is a weighted histogram approach suggested 



(6(z-z)e- w *') F = A(6( z -2)e- v >'( £ >^ (20) 



-V„,(r) 



-U(z) 



Since the equilibrium average (exp(— V Zo )) correspond- 
ing to the unbiased system contributes only an additive 
constant to the PMF, from Eq. i|20[) one obtains the fol- 
lowing result 



U(z) =-log(6(z-z)exp(-AW z /)) 



(21a) 



where 
AW Z , 



W z , -V z ,(z) =k 
-ifz'M-zMl 2 



dTZ'(T)[z'(T) -Z(T) 



(21b) 



Thus, U(z) can be calculated from the work time se- 
ries obtained in repeated cv-SMD simulations by con- 
structing a weighted histogram of the RC according to 
Eqs. This method resembles to the US and WHAM 
and is preferable to the cumulant approximation method 
whenever we have a large number of pulling paths. How- 
ever, in the case of large systems when only a limited 
number of trajectories can be sampled this method is 
inapplicable because of insufficient data. 



G. PMF from forward and reverse SMD pullings 
with a stiff spring 

In this section we present our new method for calcu- 
lating PMFs from few fast SMD pullings along the RC in 
both F and R directions, hereafter referred to as the FR 
method. We assume that the pullings are done with a suf- 
ficiently stiff spring such that the SSA holds fSec. Ill CTfl . 
In this case, the F work distribution Pf(VV) is Gaussian, 
and according to Crooks' TFT (jTBj) it follows that the R 
work distribution Pr(W) is also Gaussian. Thus one can 
write 



PF/R(W) = (27tcx F/R ) 7 exp 



(W-W F/R ) 2 

2CT F/R 



(22) 
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where Wf/R and crp^ R are the mean work and variance 
corresponding to the F and R pulling directions, respec- 
tively. The mean dissipative work in the two distinct 
pulling directions is 



W 



d F/R 



dW(W± AF)P F/R (W) =W F/R ± AF . (23) 



Inserting 122(1 into (|16|l and taking into account that 
W dF —W— AF, after little algebra it follows that TFT 
can hold only if 



Wf+Wr 



and 



AF = (W F - W R )/2 . 



(24a) 



(24b) 



Finally, inserting Eq. I|24a|) into (|23[1 . one finds that the 
mean dissipative work is the same in both F and R pulling 
directions, i.e., 



W d = W dF = W dR = (W F + W R )/2 



(24c) 



Equations l|24|l are the key formulas of our FR method 
for calculating PMFs from fast F and R SMD pullings. 
Assuming that a few (~ 10) such SMD pullings can sam- 
ple reasonably well the work about the peak position 
W F /r of Pf/r(W), as indicated by the shaded regions 
in Fig.^ then Eqs. (|24l) yield essentially with the same 
degree of accuracy both the desired PMF, AU ss AF, 
and the mean dissipative work, W d . This feature makes 
the proposed method superior to the currently used ap- 
proaches described in the previous sections. In fact, 
these other methods can only determine the mean total 
work Wp with some statistical correction either through 
the cumulant approximation or a weighted histogram 
method. Furthermore, since it is reasonable to assume 
that W d is proportional to the pulling speed v, one can 
readily determine the position dependent friction coeffi- 
cient y(z) from the slope of the mean dissipative work 
y(z) — (dW d (z)/dz) /v. Then, the corresponding diffu- 
sion coefficient is given by the Einstein relation (in kp;T 
energy units) 



D(z) = y(z)^ = v(dW d (z)/dz) 



(25) 



Now that both U(z) and D(z) are determined, the equa- 
tion of motion of the RC on a meso (or macro) time 
scale is given by the Langevin equation corresponding to 
an overdamped Brownian particle 3 



Y(z)z = -dU(z)/dz + £(t) , 



(26a) 



or equivalently, the corresponding Fokker-Planck equa- 
tion for the probability distribution function p(z, t) of 
the RC 

3 t p(z,t) = -3 z j(z,t) = 9 z D(z)9 zP (z,t)+9 z U'(z)p(z,t) , 

(26b) 



where £,(t) is the Langevin force (modeled as a Gaussian 
white noise) and j(z, t) is the probability current density. 

We emphasize again that far from equilibrium the vari- 
ance cr^, = cr z of the F/R work calculated from SMD 
pullings data [cf. I|18fl ] is m general much smaller than 
the variance cr 2 of the actual work distribution function, 
and therefore it cannot be used to estimate even approx- 
imately the mean dissipative work, unless an exponen- 
tially large number of SMD trajectories are collected and 
used for this purpose. 

Finally, we note that Pf(W) and Pr(W) are identical 
Gaussians centered about Wp and Wr, respectively. One 
can also define a distribution function for the dissipative 
work through P d (W) = Pf(W + AF) = P R (W - AF), 
which is centered about W d (Fig. QJ. This allows us 
to calculate the fraction of the SMD trajectories that 
violates the second law, i.e., for which W d < 0; these 
trajectories are crucial in establishing the validity of the 
JE. We have 



(e- w ")lw d <o = 



-w 



dWP d (W)e 
1 c /W/ 2N \ exp(-W d ) 



(27) 



= jerfc (W d /2 ) - 



W 



1/2 



which clearly indicates that for W d > 1 (i.e., W d > kpjT 
in SI units) the number of such trajectories is exponen- 
tially small, and finding any of them in SMD simulations 
of large biomolecules is rather unlikely. 



H. Generalized acceptance ratio method 

The idea of combining results from both F and R sim- 
ulations is not new, dating back to the original Bennett's 
acceptance ratio method 3 - . However, in previous such 
studies^^^MS^i*^ the focus was mainly on determin- 
ing the free energy difference between two states and 
to estimate the corresponding error, unlike in our FR 
method in which the PMF, the mean dissipative work and 
the corresponding diffusion coefficient are determined si- 
multaneously from specially designed F and R pullings 
with Gaussian distributed work. For example, starting 
from the nonequilibrium fluctuation theorem (|I2|) and 
following the general philosophy of the Bennett accep- 
tance ratio method, Crooks has shownSI that the best 
estimate (i.e., with smallest error) of the free energy [see 
Eqs. 0] 



-AF 



<f(W)) F /(f(-W)e- w ) 



(28) 



is obtained by choosing the f(W) = 1/[1+tlf/tir exp(W— 
AF)], where tt-f/r represent the number of F/R paths 
sampled. Essentially the same result was derived by 
Pande and collaborators^ by applying the maximum 
likelihood estimator (MLE) method to Crooks' TFT (|Po|l . 
Thus, the best estimate of the free energy difference AF 
between two equilibrium states corresponding to the RCs 
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z(0) and z(t) is given by the solution of the following 
transcendental equation 



1 



i=1 



1 + n F /n R exp( W Fi - AF) 



(29) 



i 



1 + n R /n F exp(-W Ri - AF) 



. 



In order to calculate the PMF U(z) along the RC, z, 
by using the above MLE method, first, one needs to di- 
vide the domain of interest {z m t n , z max } into N inter- 
vals determined by the division points Zi, i = 0,...,N. 
Then the system needs to be steered into these point 
via SMD, and in each of them it needs to be equili- 
brated. Then, depending on the available computational 
resources, a well defined number of F and R cv-SMD 
pullings should be carried out between adjacent division 
point, each time starting from a different equilibrium con- 
figuration. Finally, solving Eq. (|29|l within the SSA, one 
determines the change AU^ = LU — Ut_i along each seg- 
ment (Zi_i,ZjJ. Although, strictly speaking, the above 
methods that combine F and R SMD pullings can de- 
termine the free energy difference between initially equi- 
librated states, in practice we find that in many cases 
Eqs (|24bjl - (|24c|) give good results even between the divi- 
sion points zt (see Sec. IIIIB)l . This means that N does 
not need to be a large number, and therefore the compu- 
tational overhead due to the intermediate equilibrations 
can be significantly reduced. 



III. PMF OF WATER MOLECULES IN SWNT 

In this section we calculate the PMF that guides the 
translocation of water molecules across a periodic struc- 
ture of densely-packed SWNTs, as well as, the corre- 
sponding position dependent diffusion coefficient. The 
choice of this system as a testing ground for our FR 
method was motivated by the following. First, water 
filled SWNTs are nontrivial many particle systems com- 
prising thousands of atoms, yet they are easy to simu- 
late and the PMF of waters inside the SWNTs can be 
easily tuned by changing the Van der Waals interaction 
parameters between the carbon and water molecules 4 ^. 
Second, SWNTs are hydrophobic nanopores that can be 
regarded as simplified models for the much more com- 
plex channel proteins. Thus, they are ideal for testing 
new computational methods and hypothesis that later 
can be applied to protein channels. Finally, during the 
past few years, SWNT have been intensively studied 
through MD simulations revealing many interesting and 
surprising propertiesaaa2ii2dL^2ia2i22iai. In particular, 
these simulations revealed that hexagonally packed (6,6) 
SWNTs, with diameter of 8.1 A, spontaneously fill with 
a single file of water molecules when connecting two wa- 
ter reservoirs. Water molecules diffuse across the tubes 
in a concerted fashion, with a diffusion rate close to the 




FIG. 2: (Color online) Lateral (left) and top (right) view of 
the unit cell (a = 20 A, b = 23 A and c = 52.5 A) of the 
simulated water and SWNT system using Van der Waals and 
surface representations. Water molecules cross the SWNTs in 
single files. Figure rendered with VMD— . 



corresponding bulk value. This correlated motion can 
be described rather well with a continuous-time random 
walk (CTRW) model 4 ^. As an alternative to the CTRW 
model, here we propose a more general stochastic model 
in which the motion of each water molecule along the 
z— axis of a SWNT is characterized by an effective (po- 
sition dependent) diffusion coefficient D(z) and a PMF, 
U(z). Both quantities can be determined efficiently and 
simultaneously by our FR method. 

We consider a periodic system (see Fig. 01 of 4 
hexagonally-packed identical SWNTs of (6,6) armchair 
type. Each SWNT (156 atoms) has a C— C diameter of 
8.2 A and length 14.7 A. On both sides of the SWNTs 
there is a water layer of width 18.9 A. The system 
contains 556 water molecules in^ total. The unit cell 
has dimensions 23 x 20 x 52.5 A and contains a total 
of 2292 atoms. All MD simulations were performed in 
the NpT ensemble (T = 300 K and p = 1 atm), us- 
ing periodic boundary conditions and the PME method 
for full electrostatics^. Water molecules were modeled 
as TIP3P 5 ". To facilitate the comparison between the 
PMFs obtained with different methods, the Van der 
Waals parameters of the C atoms (of type CA for ben- 
zene in the CHARMM force field)^ were changed (from 
e = 0.10 to e = 0.13 kcal/mol, and from Rq = 3.76 
to Ro = 4.81 A, respectively) to artificially increase the 
size of the potential barriers in the PMF from 0.35 to 
2 kpjT. All simulations were performed with the pro- 
gram NAMD2— , with a performance of ~ 1 day/ns on 
8 CPUs of a G4 Beowulf cluster (preferred for repeated 
SMD pullings), or - 12 hours/ns on 24 CPUs (preferred 
for long EMD simulations). Just like in previously re- 
ported simulationo 44 i 45 i 47 i 50 , the initially empty SWNTs 
filled up completely with water (i.e., 5 molecules per nan- 
otube) in the first few hundreds of ns. Also, the arrange- 
ment of the SWNTs prevented water molecules from en- 
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FIG. 3: (Color online) (a) PMF U (z) of a water molecule 
along the z-axis of one of the SWNTs obtained through equi- 
librium MD simulations. The included snapshot illustrates 
a completely filled SWNT with five water molecules located 
about the corresponding PMF minima, (b) Comparison be- 
tween Uo(z) (thin curve) and the same PMF Uus(z) (thick 
curve) obtained from umbrella sampling. Graphics rendered 
with the program VMDi£. 



tering the space between them. 



A. PMF from equilibrium MD simulations 

The PMF U (z) [Eq. ©] was determined from a 
9 ns long EMD trajectory recorded after the system was 
equilibrated. The histogram po(z) was constructed by 
binning the z-coordinate of the O-atoms of all water 
molecules. No visible change in the normalized distri- 
bution po(z) could be noticed when the first 7 ns part 
of the EMD trajectory was used to build it, indicating 
that the sampling was complete. Inside the SWNTs (see 
Fig-El 1 ) U-o(z) has five equidistant minima (water binding 
sites) with separation distance 2.8 A and almost identi- 
cal potential barriers of height 2 IcbT. It is convenient 
to label these minima from 1 to 5 along the positive z- 
direction. On both sides, moving away from the SWNTs 
into the bulk water the PMF exhibits three more minima 
(labeled 0, —1 , —2 and 6, 7, 8, respectively) before it flat- 
tens out. Water molecules to move in an out the SWNTs 
[i.e., to hop between minima (0, 1) and (5,6)] must over- 
come roughly the same energy barrier as the ones located 
inside the tubes. However, there is a strong spatial in- 
homogeneity of the water distribution right outside the 
nanotubes that is related to the large asymmetry of the 
energy barrier connecting minima (—1,0) and (6,7), re- 
spectively. The PMF profile is reflected by the snapshot 



of the water molecules in Fig. and is compatible with 
the observation that single-file water transport through 
SWNTs usually occurs in unidirectional bursts. We have 
also determined the PMF, U U sU), inside the SWNTs 
by using umbrella sampling and WHAM, as described in 
Sec. Ill Dl A total of six sampling windows were used. 
For convenience, these were centered, by means of HGPs 
with k = 1 .2 kcal/mol • A , on the six maxima within the 
SWNTs of Uo(z). The samplings of the biased systems 
were carried out through 5 ns long EMD simulations. To 
speed up the computation, the HGPs in the four SWNTs 
were centered on different maxima. Thus each EMD 
trajectory provided four biased distribution histograms 
Pi(z). The fact that these were properly sampled was 
tested by making sure that the histograms correspond- 
ing to the first 4 ns part of the EMD trajectory coin- 
cided with the one obtained from the entire trajectory. 
Finally, Uus( z ) was determined by solving the WHAM 
Eqs. i|10|) . As shown in Fig. EJd, the agreement between 
the calculated Uo(z) and Uus(z) is rather good, though 
not perfect. 

B. PMF from nonequilibrium cv-SMD pullings 

Next, by employing our new FR approach described 
in Sec. Ill Gl the PMF Ufr(z) was determined from 
a small number of fast F and R cv-SMD pullings of 
water molecules across the SWNTs. In each cv-SMD 
simulation four water molecules were pulled across the 
SWNTs (one molecule per nanotube) by applying a stiff 

(k = 10 kcal/mol -A 2 ) HGP [see Eq. |@J|] that moved with 
v = 20 A/ns along the z-axis of the nanotubes. Only four 
such pullings were performed in both F and R directions 
between the extremities of the interval z £ [—10, 10] A. 
Each cv-SMD simulation was started from an equili- 
brated configuration (in accordance with the applicabil- 
ity of Crooks' TFT) and was 1 ns long. Out of the 
4x4 = 1 6 F and R trajectories only those where retained 
for analysis in which the corresponding SWNT remained 
filled with water at all times. In several cases, once 
the pulled water molecule crossed halfway the channel 
the binding sites behind it remained unoccupied. Since 
such configurations correspond to a different free energy 
profile, such trajectories must be dropped in determin- 
ing the PMF for a completely filled SWNT. Thus, we 
ended up with 7 F and 14 R paths for calculating the 
PMF. Because we already know the "exact" PMF U (z), 
we deliberately did not choose to add more trajectories 
from extra simulations. Indeed, since in the case of large 
biomolecules one can afford only a small number of SMD 
runs, our goal here is to test the viability of the proposed 
FR method for calculating PMFs under such unfavorable 
conditions. The external work along the F and R paths, 
including the mean work Wp/ R , are shown in Fig.^Ji and 
b, respectively. Note that in order to display Wr on the 
same plot with Wp , the sign of the former needs to be re- 
versed and shifted to the origin of the latter. As shown in 
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FIG. 4: (Color online) Work along (a) forward and (b) re- 
verse SMD pullings. The mean work W F / R is shown as a 
thick solid curve, (c) Comparison between Uo(z) (thin curve) 
and U F r(z) = (W F — W R ) /2 (thick curve), obtained from 
fast forward and reverse SMD pullings. Vertical dashed lines 
indicate the extremities of SWNTs. 



Fig-Et, within the SWNTs (indicated by dashed vertical 
lines) the PMF UpR = (Wp — Wr)/2 agrees surprisingly 
well with Uo and Uus- We have checked (results not 
shown) that by increasing the pulling speed to v = 40 A 
and using a similarly small number of F and R trajec- 
tories, the quality of the obtained PMF is very similar 
to the one shown in Fig. However, in this partic- 
ular case, the higher the pulling speed the most likely 
that the SWNTs will partially empty during pulling and, 
therefore, more runs are necessary to collect a minimum 
number of paths for calculating the PMF. 

As discussed in Sec. Ill El for TFT to be valid it is nec- 
essary that the initial states of both F and R pullings be 
sampled from an equilibrium distribution. Thus, strictly 
speaking the above results using the FR method applies 
only to the two ends of the considered interval. The 
good agreement between UpR and Uo suggests that our 
method may give reliable PMFs for all values of the RC 
z in the considered interval. However, as shown next, 
it is simple to extend our FR method to cases where 
this issue may impact negatively on the determination 
of the PMF. Thus, the RC interval was divided into 40 
segments of the same length. For each division point, 
the system was equilibrate for a few hundreds of ns by 
using the same HGP centered about those points. Start- 



ing from statistically independent equilibrium configura- 
tions, 4 pullings with the same v = 10 A/ns in both F and 
R directions were carried out on each segment. None of 
the SWNT emptied during these short cv-SMD runs and, 
therefore, all trajectories were used for analysis. The re- 
sulting PMF, Ufr-4o(z), is shown in Fig.|S^. The agree- 
ment with the previously determined UpR is fairly good, 
especially inside the SWNTs. Closer inspection suggests 
that compared to the "exact" Uo, Upr_4o is not as good 
as UpR. Thus, one may conclude, that more sampling in 
the FR method does not necessarily give better results. 
Indeed, in the FR method we only need a good estimate 
of the mean F and R work, and not a complete sam- 
pling of the corresponding work distribution functions. 
However, it is very difficult to estimate how good is the 
mean work calculated from a few fast pullings. Also, we 
have calculated the PMF in the division points by us- 
ing the MLE method, as shown in Fig. [SJi. The Umle 
points fall right on the UpR_4o curve, suggesting again 
that in the FR method the quality of the sampled paths 
is more important than the optimal statistical analysis of 
the trajectories. 

In Fig. [U> the mean F and R work is plotted for both 
cases when the F/R pullings are done in one shot (thick 
curves) and on the segments separately (thin curves). 
While Wr for both cases match almost perfectly, the dif- 
ference between the corresponding mean F work is quite 
significant and most definitively is the source of discrep- 
ancy between UpR and Upr_4q. This difference may be 
due to the smaller number of F trajectories used in case 
one, or to partially emptied sites towards the ends of the 
SWNTs during the simulations along the segments that 
were not accounted for properly. However, it is worth 
noticing that the mismatch between the PMFs is less 
pronounced than for the mean F work. 

In any event, for the same SMD data, the FR method 
gives far better results than the currently used cumulant 
approximation method based on JE (see Sec. IIIFjl . In 
Fig. the PMFs determined by applying the cumulant 
approximation separately to F and R trajectories, i.e., 
Up and Ur, are compared to Upr. It is clear that both 
Up and Ur are biased in opposite directions. Appar- 
ently this behavior was recognized in previous work in 
which the PMF of a glycerol molecule in a GlpF chan- 
nel was calculated for the first time. To eliminate the 
bias from only F pullings, the authors partitioned the 
GlpF channel into 12 segments and artificially applied 
in an alternating fashion F and R pullings in adjacent 
segments. Our FR approach for determining PMFs nat- 
urally solves this biasing issue due to the invalidity of JE 
for few, fast unidirectional SMD trajectories. We also 
note that the arithmetic mean of Up and Up (Fig. 
matches rather well UpR indicating that in fact the 2nd 
cumulant correction of the work to the PMF is irrele- 
vant in the FR method, in which the mean dissipative 
work W d » is already correctly accounted for by 

combining F with R paths. 
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FIG. 5: (Color online) (a) PMF of a water molecule in a 
SWNT determined by using the FR method: channel as a 
whole (solid) and divided in 40 adjacent segments of the same 
length (dashed). The PMF at the ends of the segments ob- 
tained with the MLE method are also shown (circles), (b) 
Mean forward and reverse work for SWNTs considered as 
a single segment (thick-solid) and as 40 adjacent segments 
(thin-solid), (c) PMFs calculated within the cumulant ap- 
proximation considering only forward (thin-solid) and reverse 
(thick-dashed) pullings. The arithmetic mean of these two 
(thin-dashed) matches almost perfectly the PMF from the 
FR method (thick-solid). Vertical dashed lines indicate the 
extremities of SWNTs. 
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FIG. 6: (Color online) (a) Mean dissipative work determined 
from the FT method (thick-solid) and the cumulant approxi- 
mation applied separately to forward (thin-solid) and reverse 
(thin-dashed) pullings. (b) Validity test of JE along forward 
(solid) and reverse (dashed) processes. The PMF U in the 
corresponding dissipative work W dF / R = W F / R =p U is deter- 
mined from the FR (thick) and the cumulant approximation 
(thin) methods, respectively. 



isfied along the F/R pullings, as shown in Fig. 03 (thin 
curves). This, of course, is expected because Ury R are 
calculated based on the assumption that JE holds. The 
reality is that, in fact, JE fails to hold for both F and 
R pullings as the system departs from equilibrium. The 
reason, of course, is that paths with negative dissipative 
work (Wd < 0) that are crucial for the validity of JE 
(Eq. I|15fl ) a re n °t sampled. This is clearly illustrated 
in Fig. where (exp(— W dF y R )), plotted by using the 
correct expressions W^f/r = Wj/r =p AU (thick curves), 
decay rapidly towards zero as the system is pulled away 
from equilibrium. Clearly, the larger the deviation from 
equilibrium the less JE is satisfied. 

The position dependent D(z) can be calculated from 
the slope of W d according to Eq. (|25[) . Since the mean 
dissipative work is almost linear it is not surprising that 
the diffusion coefficient has an almost constant value D 
71 A 2 /ns. This is more than three times smaller than the 
bulk diffusion coefficient of water Dbuik ~ 250 A 2 /ns. 



C. Dissipative work and diffusion coefficient 

Next, we focus on the determination of the mean dis- 
sipative work and the corresponding diffusion coefficient. 
In Fig. [fJJi the mean dissipative work derived from the 
individual F/R pullings and from the FR method are 
plotted. As expected, W dr y R = rjp /R /2 calculated from 

the variance of Wp/ R seriously underestimate W d deter- 
mined from the FR method by using Eq. (|24c[) . This ob- 
servation has several consequences. First, the fact that 

W dF/R does not increase fast enough with the pulling D - Stochastic model of water transport in SWNTs 

distance clearly indicates that only a small region about 

Wpy R of Pf/ R (W) is sampled and not the entire work The determined U(z) = Up R (z) and D provide the 
distribution function. Second, the strongly biased PMFs input in the FPE Eq. (|26b|) for describing water trans- 
lip^ , obtained from the cumulant approximation, lead to port through SWNT on meso/macro time scales. This 
underestimated dissipative work W dT y R = W F / R =F U F / R should be regarded as a generalization of CTRW model 
that give the false impression that the JE equation is sat- of Berezhkovskii and Hummer—. In principle, by solv- 
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ing the FPE for the nonequilibrium distribution function 
p(z, t) for well defined initial and boundary conditions 
one can completely characterize the single-file transport 
of water molecules in the considered SWNTs. A detailed 
analysis along this line will be reported in another pub- 
lication. 

In the CTRW model single-file water molecules oc- 
cupy the binding sites (PMF minima) within the SWNT. 
Since they cannot pass each other, the diffusion of wa- 
ter molecules across the nanotube is brought about by 
random hops to the empty binding sites right in front or 
behind them. The waiting (or residence) time between 
two consecutive hops is a stochastic Poisson process. Be- 
sides the equidistant spacing between two adjacent sites 
a, the mean waiting time t is the defining parameter of 
the CTRW model. In terms of t the effective diffusion 
coefficient is D c g — a 2 fix. 

In our stochastic model t is identified with the mean 
first passage time& (MFPT) from one minimum (z-i., i = 

1 5) of the PMF U(z) into the adjacent one Zj , with 

j = i ± 1 , and is given by 



dxe u(x, /D(x) 



dy e 



-U(y) 



Now, the mean waiting time can be expressed as 



(30) 



(31) 



. i=1 



i=2 



In our case N = 5 and the corresponding mean waiting 
time T ~ 84 ps. Applying our stochastic model to the 
pristine SWNT considered in Ref. |4Jj| (for which the bar- 
rier height between binding sites is only 0.35 keT com- 
pared to 2 IcbT in our modified SWNTs) one obtains 
t rs 12.9 ns that compares very well with the reported 
13 ns. 

Furthermore, the effective diffusion coefficient D G ff of 
single-file water molecules in SWNTs can be defined as 



D eff = D (cT 2 /2Dt) 



(32) 



where a = 2.8 A is the mean spacing between two ad- 
jacent binding sites. D c ff describes the diffusion of ficti- 
tious particles in the absence of the PMF with the same 
mean diffusion time on a distance a as the mean wait- 
ing time t. In our case we get D e ff ~ 45 A /ns. It is 
this diffusion coefficient that can be measured from the 
well known asymptotic formula (Az 2 (t)) = 2D e fft from 
EMD simulations. Indeed, from our simulations we ob- 
tain D e fi « 48 A /ns, in very good agreement with the 
result from our stochastic model. 

Finally, one can calculate the mean permeation time 
T across the channel in two different ways: (i) as the 
MFPT from one end of the nanotube to the other, and 
(ii) as L 2 /2D eff , where L is the length of the SWNT. In 
both cases one obtains essentially the same result: T w 
1.45 ns between zi and Z5, and T' w 3.2 ns between 



zo and z& (i.e., between the binding sites right outside 
the ends of the SWNTs). The observed 12 permeations 
per nanotube in 9 ns corresponds to a permeation time 
1.38 ns that is a good estimate for T but it is considerably 
shorter than T'. Thus, even in this relatively simple case 
very long EMD simulations are needed to calculate the 
unidirectional water flux through the modified SWNTs 
by simply counting the number of full permeations of 
water molecules, reinforcing once again the value of our 
stochastic modeling approach. 



IV. CONCLUSIONS 

The potential and value of Crooks' TFT for determin- 
ing free energy profiles is becoming more apparent both 
theoretically^! and experimentally^. In this paper we 
have shown that by employing Crooks' TFT— within 
the stiff spring approximation the potential of mean force 
along a suitably chosen reaction coordinate can be deter- 
mined (at least semiquantitatively) from combining a few 
fast forward and time reversed nonequilibrium processes 
started from an equilibrium configuration and subject to 
the same evolution protocol of the reaction coordinate. 
In the proposed FR method one determines simultane- 
ously both the PMF (U) and the mean dissipative work 
(Wd) without invoking JE. In fact, JE is not even sat- 
isfied for fast F or R pullings simply because processes 
with negative dissipative work (that transiently violates 
the second law and are exponentially small in number) 
are not sampled. The FR method is based on a key 
observation involving Crooks' TFT (which is more gen- 
eral than JE) : whenever the F work distribution function 
Pp(VV) is Gaussian (e.g., in the case of the stiff-spring ap- 
proximation) then Pr (W) is also Gaussian. Furthermore, 
Pf/r (W) have the same width and are shifted by precisely 
twice the corresponding free energy difference between 
the equilibrium states connected by the F and R pro- 
cesses. Thus, both U and Wd can be readily determined 
from the mean F and R work (W F / R ). The practical suc- 
cess of the FR method stems from the fact that the mean 
work Wp/ R can be measured rather accurately from only 
a few fast F/R pullings. This also explains why previ- 
ous methods, based on the direct application of JE, fail 
to work away from equilibrium, making them inefficient 
for practical applications. Indeed, the width of P F / R (W), 
which is proportional to , cannot be determined even 
approximately from a few unidirectional pullings, unless 
these are close to equilibrium and rendering Pp/R (W) suf- 
ficiently narrow. This FR method works rather well for 
both small and large (e.g., biomolecular) systems. Al- 
though here we applied and tested the FR method in 
the context of SMD simulations, in principle this can be 
applied equally well to analyze properly designed single 
molecule experiments. 

To test its viability, we have applied the FR method 
to determine the PMF and position dependent diffusion 
coefficient of single-file water molecules in SWNTs. The 
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derived PMF was found to be in good agreement with 
the one obtained from standard EMD methods, e.g., um- 
brella sampling. In case of large biomolecular systems, 
when EMD methods become computationally unafford- 
able, the proposed FR method may provide the only 
hope for determining PMFs. In addition, the FR method 
has the unique feature that it determines simultaneously 
both the PMF and the corresponding position dependent 
diffusion coefficient. These two quantities then can be 
used in a stochastic model that permits the study of the 
dynamics of the system along the reaction coordinate on 
meso/macro time scale by retaining its microscopic spa- 
tial resolution. For example, our stochastic model pro- 
vides a generalization of the recently proposed CTRW 



model for single-file water transport in SWNTs^. 
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